
    Scene understanding: Toward a Safer Navigation

    This talk is a broad discussion of perception and scene understanding aimed at achieving safer navigation.

    Intégration de la saillance visuelle dans la reconnaissance d’évènements rares (Integration of visual saliency into rare-event recognition)

    This paper presents a new method for the detection of rare events in video. It is based on visual saliency and on the detection and local description of interest points. Interest points are filtered using their saliency score, so that only those with visual importance are retained. A model of normal events is learned with the probabilistic generative model Latent Dirichlet Allocation (LDA), known for its performance in textual data mining. The detection of an abnormal or rare event is then carried out probabilistically through the learned model. The paper thus combines saliency-based visual focusing with automatic document classification techniques in order to classify images from a video and to detect rare events.
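
    A minimal sketch of the pipeline described above (not the authors' code): build saliency-filtered visual-word histograms, learn an LDA model of normal clips, and flag clips whose log-likelihood under that model is low as rare events. The vocabulary size, saliency threshold, and topic count are assumptions.

```python
# Sketch only: saliency-filtered bag of visual words + LDA normality model.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

VOCAB_SIZE = 500  # assumed size of the visual-word codebook

def bag_of_words(word_ids, saliency_scores, saliency_thresh=0.5):
    """Histogram of visual words for one clip, keeping only the interest
    points whose saliency score passes the threshold."""
    kept = [w for w, s in zip(word_ids, saliency_scores) if s >= saliency_thresh]
    return np.bincount(np.asarray(kept, dtype=int), minlength=VOCAB_SIZE)

# X_normal: one histogram per clip of normal activity, stacked as rows.
lda = LatentDirichletAllocation(n_components=20, random_state=0)
# lda.fit(X_normal)

def is_rare(clip_hist, log_likelihood_thresh):
    """Declare a clip rare/abnormal when its approximate log-likelihood under
    the learned topic model falls below the chosen threshold."""
    return lda.score(clip_hist.reshape(1, -1)) < log_likelihood_thresh
```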

    FIT-SLAM -- Fisher Information and Traversability estimation-based Active SLAM for exploration in 3D environments

    Active visual SLAM finds a wide array of applications for ground robots in GNSS-denied subterranean and outdoor environments. To achieve robust localization and mapping accuracy, it is imperative to incorporate perception considerations into goal selection and path planning during an exploration mission. Through this work, we propose FIT-SLAM (Fisher Information and Traversability estimation-based Active SLAM), a new exploration method tailored for unmanned ground vehicles (UGVs) exploring 3D environments. The approach is devised with the dual objectives of sustaining an efficient exploration rate while optimizing SLAM accuracy. First, a global traversability map is estimated, which accounts for the environmental constraints on traversability. We then propose a goal-candidate selection approach, together with a path-planning method towards the selected goal, that takes into account the information provided by the landmarks used by the SLAM back end, in order to achieve robust localization and successful path execution. The entire algorithm is tested and evaluated first in a simulated 3D world and then in a real-world environment, and is compared to pre-existing exploration methods. The results demonstrate a significant increase in exploration rate while effectively minimizing the localization covariance. (Comment: 6 pages, 6 figures, IEEE ICARA 202)
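
    The abstract does not give the exact utility function; the following sketch only illustrates how a goal candidate might be scored by combining a traversability check, an exploration-gain term, and the Fisher information contributed by landmarks observed along the planned path. The range-bearing sensor model, sensor range, and weighting are assumptions, not the authors' formulation.

```python
# Sketch only: traversability-gated goal scoring with a Fisher-information term.
import numpy as np

def landmark_fisher_information(rel_xy, sigma=0.1):
    """2x2 Fisher information of one range-bearing observation of a landmark
    at relative position rel_xy (illustrative sensor model)."""
    x, y = rel_xy
    r2 = x * x + y * y
    # Jacobian of [range, bearing] w.r.t. the robot position.
    J = np.array([[x, y], [-y, x]]) / np.array([[np.sqrt(r2)], [r2]])
    return J.T @ J / sigma ** 2

def goal_utility(goal, path, landmarks, traversable, expected_new_area, lam=0.5):
    """Exploration gain plus localization information along the path;
    -inf if the traversability map rejects the goal."""
    if not traversable(goal):
        return -np.inf
    info = 1e-6 * np.eye(2)  # small prior keeps the determinant well defined
    for pose in path:
        for lm in landmarks:
            rel = lm - pose
            if np.linalg.norm(rel) < 10.0:  # assumed sensor range [m]
                info += landmark_fisher_information(rel)
    return expected_new_area(goal) + lam * np.log(np.linalg.det(info))
```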

    Instance Sequence Queries for Video Instance Segmentation with Transformers

    Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results, and (2) modeling a video clip as a 3D spatio-temporal volume, with limits on resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the correspondence between query slots and network outputs. As a result, there is no need for complex data association. On a TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the YouTube-VIS dataset.
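
    One possible reading of the two-frame bipartite matching is sketched below (not the paper's implementation): each query's cost against each ground-truth instance is summed over two consecutive frames before a single Hungarian assignment, so one query slot stays tied to one instance across both frames. The classification and mask cost terms, and all tensor shapes, are assumptions.

```python
# Sketch only: two-frame Hungarian matching of instance sequence queries.
import torch
from scipy.optimize import linear_sum_assignment

def match_queries(pred_logits, pred_masks, gt_classes, gt_masks):
    """pred_logits: (2, Q, C), pred_masks: (2, Q, H, W) for two frames;
    gt_classes: (G,), gt_masks: (2, G, H, W). Returns (query_idx, gt_idx)."""
    Q = pred_logits.shape[1]
    G = gt_classes.shape[0]
    cost = torch.zeros(Q, G)
    for t in range(2):                              # accumulate over both frames
        prob = pred_logits[t].softmax(-1)           # (Q, C)
        cls_cost = -prob[:, gt_classes]             # (Q, G)
        p = pred_masks[t].flatten(1).sigmoid()      # (Q, H*W)
        g = gt_masks[t].flatten(1).float()          # (G, H*W)
        inter = p @ g.t()                           # soft mask intersection
        dice_cost = 1 - (2 * inter + 1) / (p.sum(1, keepdim=True) + g.sum(1) + 1)
        cost += cls_cost + dice_cost
    row, col = linear_sum_assignment(cost.detach().numpy())
    return row, col
```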

    A Review of Visual-LiDAR Fusion based Simultaneous Localization and Mapping

    Autonomous navigation requires a precise and robust mapping and localization solution. In this context, Simultaneous Localization and Mapping (SLAM) is a very well-suited solution. SLAM is used for many applications, including mobile robotics, self-driving cars, unmanned aerial vehicles, and autonomous underwater vehicles. In these domains, both visual and visual-IMU SLAM are well studied, and improvements are regularly proposed in the literature. However, LiDAR-based SLAM techniques have remained much the same as ten or twenty years ago. Moreover, few research works focus on vision-LiDAR approaches, whereas such a fusion would have many advantages. Indeed, hybridized solutions improve the performance of SLAM, especially with respect to aggressive motion, lack of light, or lack of visual features. This study provides a comprehensive survey of visual-LiDAR SLAM. After a summary of the basic idea of SLAM and its implementations, we give a complete review of the state of the art of SLAM research, focusing on solutions using vision, LiDAR, and a sensor fusion of both modalities.

    A very simple framework for 3D human poses estimation using a single 2D image: Comparison of geometric moments descriptors.

    In this paper, we propose a framework to automatically extract the 3D pose of an individual from a single silhouette image obtained with a classical low-cost camera, without any depth information. By pose, we mean the configuration of human bones used to reconstruct a 3D skeleton representing the 3D posture of the detected human. Our approach combines prior learned correspondences between silhouettes and skeletons extracted from simulated 3D human models publicly available on the internet. The main advantages of such an approach are that silhouettes can be extracted very easily from video, and 3D human models can be animated using motion capture data in order to quickly build training data for any movement. In order to match detected silhouettes with simulated silhouettes, we compared geometric invariant moments. Our results show that the proposed method provides very promising results with very low processing time.
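
    As one concrete, hedged example of the silhouette matching, the sketch below describes a binary silhouette with OpenCV's Hu moments (one standard family of geometric invariant moments among those the paper compares) and retrieves the 3D skeleton whose simulated silhouette descriptor is closest. The descriptor choice, database layout, and distance metric are illustrative assumptions.

```python
# Sketch only: Hu-moment silhouette descriptor + nearest-neighbour skeleton lookup.
import cv2
import numpy as np

def silhouette_descriptor(mask):
    """Log-scaled Hu moments of a binary silhouette (NumPy array of 0/1)."""
    m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def estimate_pose(query_mask, db_descriptors, db_skeletons):
    """Return the 3D skeleton of the simulated model whose silhouette
    descriptor is nearest to the query silhouette's descriptor."""
    q = silhouette_descriptor(query_mask)
    dists = np.linalg.norm(db_descriptors - q, axis=1)
    return db_skeletons[int(np.argmin(dists))]
```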

    3D Human Poses Estimation from a Single 2D Silhouette

    This work focuses on the problem of automatically extracting human 3D poses from a single 2D image. By pose, we mean the configuration of human bones used to reconstruct a 3D skeleton representing the 3D posture of the detected human. This problem is highly non-linear in nature and confounds standard regression techniques. Our approach combines prior learned correspondences between silhouettes and skeletons extracted from 3D human models. In order to match detected silhouettes with simulated silhouettes, we used Krawtchouk geometric moments as the shape descriptor. We provide quantitative results for image retrieval across different actions and subjects, captured from differing viewpoints. We show that our approach gives promising results for 3D pose extraction from a single silhouette.
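
    For completeness, a minimal sketch of a weighted Krawtchouk moment descriptor is given below; it follows the standard definition (the hypergeometric series together with the binomial weight and norm), not necessarily the authors' implementation, and the moment order and p = 0.5 are assumptions.

```python
# Sketch only: low-order weighted Krawtchouk moments of a binary silhouette.
import math
import numpy as np

def poch(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1), with (a)_0 = 1."""
    out = 1.0
    for i in range(k):
        out *= a + i
    return out

def krawtchouk(n, x, p, N):
    """Krawtchouk polynomial K_n(x; p, N) evaluated from its finite series."""
    return sum(poch(-n, k) * poch(-x, k) / (poch(-N, k) * math.factorial(k))
               * (1.0 / p) ** k for k in range(n + 1))

def weighted_krawtchouk(n, p, N):
    """Orthonormal (weighted) Krawtchouk polynomial values for x = 0..N."""
    xs = np.arange(N + 1)
    w = np.array([math.comb(N, int(x)) * p ** x * (1 - p) ** (N - x) for x in xs])
    rho = ((1 - p) / p) ** n / math.comb(N, n)
    k = np.array([krawtchouk(n, int(x), p, N) for x in xs])
    return k * np.sqrt(w / rho)

def krawtchouk_moments(silhouette, order=4, p=0.5):
    """Matrix of moments Q_mn of a 2D binary silhouette, usable as a descriptor."""
    H, W = silhouette.shape
    Kx = [weighted_krawtchouk(n, p, W - 1) for n in range(order + 1)]
    Ky = [weighted_krawtchouk(m, p, H - 1) for m in range(order + 1)]
    f = silhouette.astype(float)
    return np.array([[Ky[m] @ f @ Kx[n] for n in range(order + 1)]
                     for m in range(order + 1)])
```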

    Deep Learning Based Traffic Signs Boundary Estimation

    In the context of autonomous navigation, the localization of the vehicle relies on the accurate detection and tracking of artificial landmarks. These landmarks are based on handcrafted features. However, because of their low-level nature, they are neither informative nor robust under varying conditions (lighting, weather, point of view). Moreover, in Advanced Driver-Assistance Systems (ADAS) and road safety, intense efforts have been made to implement automatic visual data processing, with special emphasis on road object recognition. The main idea of this work is to detect accurate, higher-level landmarks such as static semantic objects using deep learning frameworks. We mainly focus on the accurate detection, segmentation, and classification of vertical traffic signs according to their function (danger, give way, prohibition/obligation, and indication). This paper presents the boundary estimation of European traffic signs from a monocular camera embedded in a vehicle. We propose a framework using two different deep neural networks in order to: (1) detect and recognize traffic signs in the video stream, and (2) regress the coordinates of each vertex of the detected traffic sign to estimate its shape boundary. We also provide a comparison of our method with Mask R-CNN [1], the state-of-the-art segmentation method.
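
    The abstract only states that a second network regresses the coordinates of the sign's vertices; the sketch below shows one way such a head could look, namely a small CNN over the cropped detection that outputs a fixed number of normalized vertices. The backbone, crop size, and vertex count are assumptions, not the authors' architecture.

```python
# Sketch only: vertex-regression head applied to cropped traffic-sign detections.
import torch
import torch.nn as nn

class VertexRegressor(nn.Module):
    def __init__(self, num_vertices=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, 2 * num_vertices)  # (x, y) per vertex

    def forward(self, crops):                 # crops: (B, 3, 64, 64) sign patches
        f = self.features(crops).flatten(1)   # (B, 128)
        return self.head(f).sigmoid()         # vertex coordinates in [0, 1]

# Training would minimize an L1 loss against annotated vertices, e.g.:
# loss = nn.functional.l1_loss(model(crops), gt_vertices)
```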

    A Review of Environmental Context Detection for Navigation Based on Multiple Sensors

    Current navigation systems use multi-sensor data to improve localization accuracy, but often without any certainty about the quality of those measurements in certain situations. Context detection will enable us to build an adaptive navigation system that improves the precision and robustness of its localization solution by anticipating possible degradations in sensor signal quality (GNSS in urban canyons, for instance, or camera-based navigation in a non-textured environment). That is why context detection is considered the future of navigation systems. It is therefore important first to define this concept of context for navigation and to find a way to extract it from the available information. This paper overviews existing GNSS and on-board vision-based solutions for environmental context detection. The review shows that most state-of-the-art research works focus on only one type of data, and confirms that the main perspective for this problem is to combine different indicators from multiple sensors.
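
    The survey does not prescribe an algorithm; as a hedged illustration of its closing remark about combining indicators from multiple sensors, the sketch below pools simple GNSS and camera indicators into a feature vector and trains an off-the-shelf classifier to label the navigation context. The indicator set and context labels are assumptions.

```python
# Sketch only: context classification from GNSS and image indicators.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def context_features(cn0_dbhz, num_sats, image_gray):
    """Per-epoch indicators: C/N0 statistics and satellite count from the GNSS
    receiver, plus a gradient-energy texture measure from the on-board camera."""
    gy, gx = np.gradient(image_gray.astype(float))
    texture = float(np.mean(gx ** 2 + gy ** 2))
    return [float(np.mean(cn0_dbhz)), float(np.std(cn0_dbhz)),
            float(num_sats), texture]

# X: one feature row per labelled epoch; y: context labels such as
# "open-sky", "urban canyon", "indoor", collected offline.
# clf = RandomForestClassifier(random_state=0).fit(X, y)
# context = clf.predict([context_features(cn0, nsat, frame)])
```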

    Recursive linearly constrained Wiener filter for robust multi-channel signal processing

    This article introduces a new class of recursive linearly constrained minimum variance estimators (LCMVEs) that provides additional robustness to modeling errors. To achieve that robustness, a set of non-stationary linear constraints is added to the standard LCMVE, allowing a closed-form solution that becomes appealing in sequential implementations of the estimator. Indeed, a key point of such a recursive LCMVE is that it is fully adaptive in the context of sequential estimation, as it allows optional constraints to be added, triggered by a preprocessing of each new observation or by external information on the environment. This methodology has significance in the popular problem of linear regression, among others. In particular, this article considers the general class of partially coherent signal (PCS) sources, which encompasses the case of fully coherent signal (FCS) sources. The article derives the recursive LCMVE for this type of problem and investigates, analytically and through simulations, its robustness against mismatches in linear discrete state-space models. Both errors on system matrices and noise statistics uncertainty are considered. An illustrative multi-channel array processing example is treated to support the discussion, where results in different model-mismatch scenarios are provided with respect to the standard case with only FCS sources.
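
    The recursive, non-stationary-constraint estimator is the article's contribution and cannot be reconstructed from the abstract; for background only, the classical batch linearly constrained minimum variance filter that it extends can be written, in generic notation that may differ from the article's, as

```latex
% Background only: classical batch LCMV filter (generic notation).
\min_{\mathbf{w}}\ \mathbf{w}^{H}\mathbf{R}_{y}\mathbf{w}
\quad \text{s.t.} \quad \boldsymbol{\Lambda}^{H}\mathbf{w} = \mathbf{f}
\;\Longrightarrow\;
\mathbf{w}^{\star} = \mathbf{R}_{y}^{-1}\boldsymbol{\Lambda}
\left(\boldsymbol{\Lambda}^{H}\mathbf{R}_{y}^{-1}\boldsymbol{\Lambda}\right)^{-1}\mathbf{f},
```

    where R_y is the observation covariance, Λ stacks the linear constraints, and f collects the desired responses; the recursive LCMVE of the article updates this solution sequentially as new observations, and optionally new constraints, arrive.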